Considerable progress in satellite remote sensing (SRS) of dust particles has been
seen in the last decade. From an environmental health perspective, detecting dust events
and linking them to ground particulate matter (PM) concentrations can serve as a proxy for
acute exposure to respirable particles of certain properties (i.e. size, composition, and
toxicity). Because the region is considerably affected by atmospheric dust, previous studies
in the Eastern Mediterranean, and in Israel in particular, have focused on mechanistic and
synoptic prediction, classification, and characterization of dust events. In particular,
a scheme for identifying dust days (DD) in Israel based on ground PM10 (particulate
matter of size smaller than 10 µm) measurements has been suggested and
validated by compositional analysis. This scheme requires information regarding
ground PM10 levels, which is naturally limited in places with sparse ground-monitoring
coverage. In such cases, SRS may be an efficient and cost-effective alternative to
ground measurements. This work demonstrates a new model for identifying DD and
non-DD (NDD) over Israel based on an integration of aerosol products from different
satellite platforms (Moderate Resolution Imaging Spectroradiometer (MODIS) and
Ozone Monitoring Instrument (OMI)).

Analysis of ground-monitoring data from 2007 to 2008 in southern Israel revealed
67 DD, with more than 88% occurring during winter and spring. A Classification and
Regression Tree (CART) model that was applied to a database containing ground-monitoring
(the dependent variable) and SRS aerosol product (the independent variables)
records revealed an optimal set of binary variables for the identification of DD. These
variables are combinations of the following primary variables: the calendar month,
ground-level relative humidity (RH), the aerosol optical depth (AOD) from MODIS,
and the aerosol absorbing index (AAI) from OMI. A logistic regression that uses these
variables, coded as binary variables, demonstrated 93.2% correct classifications of DD
and NDD. Evaluation of the combined CART-logistic regression scheme in an adjacent
geographical region (Gush Dan) demonstrated good results. Using SRS aerosol
products for DD and NDD identification may enable us to distinguish between health,
ecological, and environmental effects that result from exposure to these distinct particle
populations.


1. Introduction 

Several regions are known to be sources for dust resuspension, including northeastern 
and central Asia, north Africa, Saudi Arabia, Sudan, Chad, and central Australia (Dayan 
et al. 2007; Ganor et al. 2010; Ochirkhuyag and Tsolmon 2008; Washington et al. 2003).
Airborne dust has a wide environmental impact, including effects on visibility, radiative 
forcing, and the Earth’s energy balance (Goudie and Middleton 2006; Washington et al. 
2003). Because the region is considerably affected by atmospheric dust, previous studies in the
Eastern Mediterranean, and in Israel in particular, focused on mechanistic (Alpert et al. 2002)
and synoptic (Alpert et al. 2004) identification, characterization, prediction, and
classification of dust events. The spectrum of tools used in these studies includes mineralogical
and chemical characterization of dust particles (Erel et al. 2006; Ganor, Stupp, and Alpert 
2009; Kalderon-Asael et al. 2009), estimation of particle transboundary transport and fate 
(Rudich et al. 2008), and analysis of satellite observations (Carmona and Alpert 2009). 
Days heavily affected by dust, termed hereafter dust days (DD), were found to occur during 
certain synoptic patterns (Alpert et al. 2004; Dayan et al. 2007) and to be characterized by 
specific meteorological conditions (Ganor et al. 2010). These conditions carry with them 
information regarding the origin of the dust and therefore its related attributes, which can 
affect the health of the exposed individuals. In particular, during their transport, dust
particles can absorb airborne pollutants such as metals and volatile organic compounds (Erel
et al. 2006; Falkovich et al. 2004) that modify their mineral composition and consequently
their supposedly harmless nature. Indeed, the literature on health effects from exposure
to dust is inconsistent. Whereas some studies report non-detrimental effects from exposure 
to crustal (Laden et al. 2000) and dust (Prospero et al. 2008; Schwartz et al. 1999) particles, 
other studies report clear evidence of adverse health effects (cf. Jimenez et al. 2010; Lipsett 
et al. 2006; Middleton et al. 2008). 

To date, identification and characterization of DD in Israel have been based mainly 
on ground particulate matter (PM) observations and compositional analysis (Dayan et al. 
2007; Ganor, Stupp, and Alpert 2009). Following Kaufman et al. (2005), who introduced 
the use of satellite-borne data to distinguish dust aerosols over the ocean, the last decade 
has seen considerable progress in using satellite remote sensing (SRS) for retrieving
reliable data on dust aerosols (cf. Christopher and Jones 2010) and for developing different
techniques for utilizing data from the Moderate Resolution Imaging Spectroradiometer 
(MODIS) and Ozone Monitoring Instrument (OMI) for dust detection. Examples include 
the use of satellite data and imagery for studying dust events and their broad environmental 
effects over the Australian (Baddock, Bullard, and Bryant 2009) and Indian (Badarinath 
et al. 2010) subcontinents, the Persian Gulf, northwestern China, and the USA (Huang 
et al. 2010). Environmental health studies, however, mostly use risk metrics that are based 
on ground air quality data. To overcome the sparse and heterogeneous spatial distribution
of ground-monitoring stations, satellite-borne observations of the aerosol optical depth
(AOD) through atmospheric columns have been suggested as a proxy of ground-level PM.
Assessing the relationships between AOD and surface PM is an active research area (e.g.
Engel-Cox et al. 2004; Hutchison, Smith, and Faruqui 2005; Lee et al. 2011; Paciorek
et al. 2008; Van Donkelaar et al. 2010), which has recently been utilized also in
epidemiological studies (Hu 2009; Hu and Rao 2009). In particular, the use of AOD retrievals from
the MODIS instruments on board the Terra and Aqua polar-orbiting satellites for
environmental health applications is currently being explored owing to the spatial coverage, temporal
resolution (almost daily global coverage), and availability of the AOD data. Satellite imagery is a useful tool
for identifying specific aerosol events, such as large biomass fires, volcanic ash, smoke, 
and thick haze (Hoff and Christopher 2009; Martin 2008; Van Donkelaar et al. 2011). 
However, an efficient DD identification scheme for analysing more than a few specific 
days/events, e.g. for comprehensive environmental health studies, has not been explored to 
date. This study focuses on developing a model for retrospective identification of days with
considerable dust concentrations using almost solely satellite-borne aerosol products. 

A scheme for identifying DD in Israel, based on ground PM10 (particulate matter of
size smaller than 10 µm) measurements, was suggested and validated by PM compositional
analysis by Ganor, Stupp, and Alpert (2009). Their criterion for assessing, retrospectively,
whether a given day was a dust day (a day characterized by a dominant particulate mineral
fraction) is that at least three consecutive hours (six successive half-hourly readings)
of PM10 records were above 100 µg m⁻³, with the highest value above 180 µg m⁻³.
Naturally, this scheme requires information on ground PM10 concentrations and is clearly
limited, therefore, to places with ground-monitoring coverage. For example, ground PM
monitoring in urban areas in Israel is fairly dense, with an average interstation separation
of ~5 km, whereas in rural areas it is rather sparse (Figure 1). In such areas, it would
be useful if SRS aerosol products could be used to identify DD and non-DD (NDD).
Discriminating between these two populations has merit for environmental health studies 
due to the potentially distinct toxicity of the particles and for air quality management when 
abatement measures are sought as a response to recurrent exceedances. In this work, we 
present a scheme that may enable epidemiologists, environmental scientists, and ecologists 
to explore the distinct effects of dust and non-dust particles on expanded temporal scales. 
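
The single-station criterion of Ganor, Stupp, and Alpert (2009) described above can be illustrated with a short Python sketch. It is not the authors' code: the function name `is_dust_day`, the representation of a day as a sequence of half-hourly PM10 values (in µg m⁻³, with `None` for missing readings), and the reading that the peak above 180 µg m⁻³ must fall inside the qualifying run are all assumptions made here for illustration.

```python
from typing import Optional, Sequence


def is_dust_day(pm10: Sequence[Optional[float]],
                base: float = 100.0,
                peak: float = 180.0,
                min_run: int = 6) -> bool:
    """Return True if a run of at least `min_run` consecutive half-hourly
    readings exceeds `base` and the largest reading in that run exceeds `peak`.

    Assumption: a missing reading (None) interrupts the run.
    """
    run_len = 0
    run_max = float("-inf")
    for value in pm10:
        if value is not None and value > base:
            run_len += 1
            run_max = max(run_max, value)
            if run_len >= min_run and run_max > peak:
                return True
        else:
            # The run is broken; start counting again.
            run_len = 0
            run_max = float("-inf")
    return False
```

With the default arguments, six successive readings correspond to the three consecutive hours required by the criterion.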


2. Data 

2.1. Ground observations 

Figure 1. Locations of PM monitoring stations in the study area in southern and central Israel.

Half-hourly concentrations of PM10 and PM2.5 (particulate matter of size smaller than
2.5 µm) from 2007 to 2008, gathered by the regional air quality monitoring network in
southern Israel (Figure 1), were used. The typical instrument error is ±1% (Yuval and
Broday 2006). The PM10 data were used for (a) identifying DD and NDD in the training
sets and (b) evaluating the results of the logistic regression model when it operated on the
test data sets (see Section 3). Reliable relative humidity (RH) data were obtained from a
ground meteorological station situated near one of the air quality monitoring stations. The 
RH data were used at the model evaluation stage (see Section 3). Characteristics of all 
the DD observed during the study period were studied by applying Alpert et al.’s (2004) 
semi-objective classification of daily synoptic patterns based on the National Centers
for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR)
reanalysis data and by a careful inspection of back-trajectories of air masses using the
Hybrid Single Particle Lagrangian Integrated Trajectory (HYSPLIT) model
(http://ready.arl.noaa.gov).

2.2. Satellite data 

The Terra and Aqua satellites pass over the study area (southern Israel: 31.52° N-31.91°
N, 34.5° E-34.85° E) between 09:30 and 11:30 and between 12:00 and 14:00 UTC, respectively.
MODIS collection 5.1 data (Levy et al. 2009; http://modis-atmos.gsfc.nasa.gov) from
1 January 2007 to 31 December 2008 from both platforms were used. The MODIS products
that were used include (a) the AOD at 550 nm, retrieved using the operational dark target
(DT) algorithm, (b) the Ångström exponent (AE), derived from the DT-AOD retrievals at
470 and 670 nm over the land, and (c) the single scattering albedo (SSA, the ratio of the 
aerosol scattering to its extinction coefficients) at 470 nm. Typically, for a given pair of 
wavelengths, the AE decreases as the particle size increases and takes values ranging from 
more than 1.5 for fine particles like those formed during combustion processes to nearly 
zero for coarse dust particles (Kaskaoutis et al. 2007). Similarly, the SSA has been used as 
a key parameter for defining the aerosol optical properties and for classification of aerosol 
types (Meloni et al. 2006). However, as will be shown, due to poor availability and small 
variability of the SSA in this study, it turned out to be inadequate for DD identification. 
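
For orientation, the dependence of the AE on a pair of AOD retrievals can be sketched with the standard two-wavelength definition, AE = -ln(τ1/τ2)/ln(λ1/λ2). The function name `angstrom_exponent` and its argument names are hypothetical, and whether the MODIS product is computed in exactly this way is not stated in the text, so this is an illustration rather than the operational algorithm.

```python
import math


def angstrom_exponent(aod_1: float, aod_2: float,
                      wl_1_nm: float = 470.0,
                      wl_2_nm: float = 670.0) -> float:
    """Two-wavelength Angstrom exponent from AOD at wl_1_nm and wl_2_nm.

    Both AOD values must be positive; the default wavelengths are the
    470 and 670 nm channels mentioned in the text.
    """
    return -math.log(aod_1 / aod_2) / math.log(wl_1_nm / wl_2_nm)
```

A spectrally flat AOD (equal values at both wavelengths) gives an exponent of zero, the coarse-particle limit described above, whereas an AOD that falls steeply with wavelength gives a large exponent.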

MODIS Level 2 aerosol products with a spatial (grid) resolution of 10 km × 10 km
and quality flags of ‘good’ and ‘very good’ (QA = 2 and QA = 3, respectively) were
used. Ichoku et al. (2002) suggested that 5 pixel × 5 pixel averaging of MODIS AOD is
spatially correlated with hourly averages of AOD, $\tau$, observed by ground sunphotometers
(AERONET), with a global uncertainty of the MODIS DT-AOD relative to the AERONET
AOD of $\Delta\tau = \pm(0.05 + 0.15\tau)$ (Levy et al. 2010). Since the correlation coefficients
between the AERONET AOD from the Ness-Ziona site, Israel, and the MODIS DT-AOD 
from Aqua and Terra were 0.87 and 0.89, respectively, and as the expected AOD error was 
within the global uncertainty range, the quality of the MODIS DT-AOD used in this study 
was confirmed. Furthermore, based on 7 year data, Kaufman et al. (2000) concluded that 
the aerosol products from MODIS ‘instantaneous’ overpass highly correlate with the daily 
average AOD. Hence, the daily average aerosol products were used throughout this study. 
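
The uncertainty check described above, i.e. whether a MODIS retrieval falls within ±(0.05 + 0.15τ) of the collocated AERONET value τ, can be written down as a minimal sketch. The function names and the representation of the collocated data as (MODIS, AERONET) pairs are assumptions for illustration only.

```python
from typing import Iterable, Tuple


def modis_expected_error(aeronet_aod: float) -> float:
    """Half-width of the global uncertainty envelope, 0.05 + 0.15 * tau."""
    return 0.05 + 0.15 * aeronet_aod


def within_expected_error(modis_aod: float, aeronet_aod: float) -> bool:
    """True if the MODIS retrieval lies inside the envelope around AERONET."""
    return abs(modis_aod - aeronet_aod) <= modis_expected_error(aeronet_aod)


def fraction_within(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (modis_aod, aeronet_aod) pairs inside the envelope."""
    flags = [within_expected_error(m, a) for m, a in pairs]
    return sum(flags) / len(flags)
```

For example, for an AERONET AOD of 0.4 the envelope half-width is 0.11, so a MODIS value of 0.5 is accepted and a value of 0.6 is not.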

OMI, on board Aura, measures the Earth reflectance spectra in both visible (VIS) and 
ultraviolet (UV) (270-500 nm) spectral bands and can distinguish between UV-absorbing 
aerosols, such as desert dust, and weakly UV-absorbing aerosols and clouds (Kazadzis et al. 
2009; Stammes and Noordhoek 2002). The aerosol absorbing index (AAI) is derived from 
the change in the spectral dependence of backscattered UV radiance by aerosols relative 
to Rayleigh scattering in the 354-388 nm spectral range. The AAI was found to be a useful
indicator of elevated concentrations of UV-absorbing aerosols, such as dust (Jethva and
Torres 2011), taking a near-zero value for clouds and weakly absorbing aerosols and a
positive value for desert aerosols (Huang et al. 2010). Aura observes the study area between
09:00 and 11:00 UTC with a nearly daily pass. Two years (1 January 2007-31 December
2008) of OMI AAI data (OM-AURA_L2) with a spatial resolution of 13 km × 24 km
were used.

3. Methods

Using half-hourly ground PM10 records from 2007 to 2008, we assembled a list of DD and
NDD based on a modification of Ganor, Stupp, and Alpert’s (2009) scheme for DD
identification. Since dust events have a considerable spatial extent (Yuval and Broday 2006),
we modified Ganor et al.’s scheme, requiring that the conditions for DD identification (see
Section 1) occur in at least three nearby monitoring stations simultaneously (or within a
very short lag time). The DD list was scrutinized using (a) HYSPLIT back-trajectories of
the air masses, (b) Alpert et al.’s (2004) semi-objective synoptic classification, and (c) a
careful inspection of the synoptic maps on the DD.

The differences between the two populations (DD and NDD) in seasonality, synoptic
class frequency, daily average ground-monitoring parameters (PM2.5, RH, etc.), and the
SRS aerosol products were examined. The distinct differences between these two
populations (see Section 4) supported the development of a DD/NDD classification model
(Figure 2). As a first step, the best discrimination rules for distinguishing between DD and
NDD were obtained using the nonparametric Classification and Regression Tree (CART)
algorithm (Breiman et al. 1984). CART has been used to identify potential causal
relationships in a variety of environmental data sets (e.g. Hu et al. 2008; Rothwell, Futter,
and Dise 2008; Sullivan et al. 2006). In this study, it was applied to the entire study database,
which included a categorical seasonality variable (month) and daily SRS aerosol products:
AOD, AE, and SSA (from MODIS) and AAI (from OMI). This database is designated
hereafter as DB1. The binary response variable ‘dust’ takes the values dust = 1 for DD and
dust = 0 for NDD. The output of this step was the best set of explanatory factors (rules)
that were associated with dust = 1. These factors were used to transform DB1 into a binary
database (DB2), which was used to develop a logistic regression model for identifying DD.
The CART model was applied using ‘R’ software (R Development Core Team 2009). The
CART algorithm can accept prior information on the outcome probabilities and use it for
building the tree. We examined to what extent prior information on the 5-year mean ratio
of DD/NDD in the study area (~1:10) modifies the CART selection of optimal factors for
DD classification. A cross-validation procedure was applied for estimating the
misclassification rates. The final partitioning of the data was determined using the tree that yields the
smallest cross-validation estimation error (Breiman et al. 1984).

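Occurrence of DD (dust = 1) was modelled by a logistic regression of the form

$$\operatorname{logit}(P) = \ln\left(\frac{P}{1 - P}\right) = \beta_0 + \sum_{i} \beta_i x_i, \qquad (1)$$

where P is the probability of DD occurrence, the $x_i$ are the variables obtained by
the CART (see Table 5), and the $\beta_i$ are the regression coefficients. Equation (1) has been parameterized by regressing it against a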
training subset that was constructed by randomly selecting 80% of each population (DD 
and NDD) from DB2. Subsequently, evaluation of the logistic model was performed using a 
test subset, i.e. the remaining 20% of DB2. This procedure (i.e. model parameterization and 
evaluation) was repeated 10 times for 10 different random selections of the training and test 
data sets. For each record in the test subsets, the probability, P, that dust = 1 was calculated. 
Each observation in the test set was classified as a dust day if its calculated probability, 
P, was higher than a given threshold. This threshold was determined after examining the 
receiver operating characteristic (ROC) curves that were obtained for each run. Each point 
on the ROC curve represents a sensitivity/specificity pair that corresponds to a particular 
discrimination threshold (Bradley 1997; Fawcett 2005; Zweig and Campbell 1993). It is
noteworthy that the true DD and NDD populations were obtained using Ganor, Stupp, and
Alpert’s (2009) modified scheme and served as the evaluation list against which the model 
could be tested. Since DD were found to be associated with certain RH profiles (Falkovich 
et al. 2004; Ganor et al. 2010), we examined also whether the use of surface RH data, if 
available, as an additional explanatory variable can improve the performance of the model. 
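
The evaluation procedure described above (ten random 80%/20% splits drawn separately from each population, followed by thresholding of the predicted probabilities) can be sketched as follows. This is only an illustration under stated assumptions: the function names `stratified_split` and `roc_point`, the use of Python's `random` module, and the rounding of the 80% share to the nearest integer are choices made here and are not taken from the study, which used R.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_frac: float = 0.8,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), sampling train_frac of each class."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train += idx[:n_train]
        test += idx[n_train:]
    return sorted(train), sorted(test)


def roc_point(probs: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when P > threshold means dust day.

    Assumes both classes are present in `labels`.
    """
    tp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p <= threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p <= threshold and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# 67 DD (label 1) and 664 NDD (label 0), as in the study period.
labels = [1] * 67 + [0] * 664
splits = [stratified_split(labels, seed=run) for run in range(10)]
```

Sweeping `threshold` over the interval from 0 to 1 and collecting the resulting pairs traces one ROC curve per run.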


4. Results 

An analysis of the half-hourly ground PM10 monitoring data from January 2007 to
December 2008 revealed 67 DD and 664 NDD. Most DD occurred in spring (March to
May, 57%) and winter (December to February, 31%), in agreement with previous findings
(Dayan and Levy 2005; Dayan et al. 2007; Erel et al. 2006). A meticulous analysis
of synoptic maps for the DD supported our classification of the 67 DD using the modified
Ganor scheme. Based on Alpert et al.’s (2004) semi-objective synoptic classification
scheme, the synoptic distribution of these populations (Figure 3, Table 1) demonstrates that
DD occurred mostly on days characterized by Sharav lows (cyclones that form along the
North African and the southern Mediterranean coastline) and lows to the north and to the
west (e.g. Cyprus lows), whereas Persian troughs and highs to the west were more common
on NDD. However, neither DD nor NDD were characterized by one prevailing synoptic
class. This conclusion holds also when the 19 synoptic classes are pooled into six dominant
synoptic patterns (Red Sea trough, Persian trough, highs, winter lows, lows to the east, and
Sharav low). It is noteworthy that since synoptic systems move more slowly than the measured
winds, the synoptic classification may not always be synchronized with actual dust
occurrences. This limits the applicability of synoptic classification for DD identification
and prediction, especially if the dust event does not occur close to the synoptic classification
time, i.e. 12:00 UTC. This conclusion is in general agreement with the findings of
Dayan et al. (2007), Carmona and Alpert (2009), and Ganor et al. (2010). 



Figure 3. Distribution of DD and NDD by synoptic class in 2007-2008.

Table 1. Synoptic classes and their synoptic pattern category in the Eastern Mediterranean
(following Alpert et al. 2004).

No.   Synoptic class                       No.   Synoptic pattern
1     Red Sea trough with eastern axis     I     Red Sea trough
2     Red Sea trough with western axis     I     Red Sea trough
3     Red Sea trough with central axis     I     Red Sea trough
4     Persian trough (weak)                II    Persian trough
5     Persian trough (medium)              II    Persian trough
6     Persian trough (deep)                II    Persian trough
7     High to the east                     III   Highs
8     High to the west                     III   Highs
9     High to the north                    III   Highs
10    High over Israel (central)           III   Highs
11    Low to the east (deep)               IV    Low to the east
12    Cyprus low to the south (deep)       V     Winter lows
13    Cyprus low to the south (shallow)    V     Winter lows
14    Cyprus low to the north (deep)       V     Winter lows
15    Cyprus low to the north (shallow)    V     Winter lows
16    Cold low to the west                 V     Winter lows
17    Low to the east (shallow)            IV    Low to the east
18    Sharav low to the west               VI    Sharav low
19    Sharav low over Israel (central)     VI    Sharav low


Regarding the distribution of different ground-monitoring parameters in the two
populations, Figure 4 shows that the PM2.5/PM10 ratio has a significantly narrower range
and a smaller median and mean on DD than on NDD (Table 2). Similar results were also
observed in the Haifa Bay area, Israel, with mean PM2.5/PM10 ratios of 0.36 and 0.65 on
DD and NDD, respectively (HDMAE 2008). This result supports previous findings that
high background PM2.5 concentrations (mainly sulphate transported from Eastern Europe)
characterize NDD in the East Mediterranean, whereas DD are dominated by coarser size
aerosols (Asaf et al. 2008). Table 2 reveals that the differences between DD and NDD
populations are statistically significant. In particular, the difference in ambient RH between
the two populations results from the lower RH of desert-borne air masses compared to the
higher RH of air masses on NDD (Ganor et al. 2010).

Among the satellite-borne remotely sensed parameters studied, only AOD and AE 
showed significant differences between the two populations (Table 3), suggesting that these 
parameters may be useful for DD identification based on SRS aerosol products. However, 
it is notable that although DD and NDD were characterized as having a significantly different
AE, its maximum value was lower than 1 in both populations even if AE > 1 could be
expected on NDD. Moreover, although only marginal differences were observed in the SSA 
and the AAI between DD and NDD, the tails of their pertinent distributions (especially for 
the SSA) were different in the two populations (Figure 5). These observations demonstrate 
the difficulties when the input to the DD classification model is obtained by a subjective 
choice of variables. 

Tables 2 and 3 and Figure 5 show that DD are characterized by high variability of both
ground and satellite-borne parameters, but that it may be possible to use SRS aerosol
products for DD identification. However, unlike the continuous ground-monitoring data,
AOD retrievals at any given location can be obtained only once or twice per day and require
cloud-free conditions. In particular, the limited availability of the SSA (Table 4) during the
study period severely affected the possibility of using it within a DD classification model.

Figure 4. Ratio of ground PM2.5 to PM10 concentrations on DD (top) and NDD (bottom) in southern
Israel, 2007-2008.


Table 2. Statistics of daily mean ground-monitoring attributes on DD (67 cases) and NDD
(664 cases) in the years 2007-2008.

                   First-third quartiles        Median
                   DD            NDD            DD        NDD       p-Value*
PM10 (µg m⁻³)      106.3-176.4   30.3-47.4      124.43    38.94     <0.001
PM2.5 (µg m⁻³)     28.7-49.5     14.2-23.7      36.48     18.79     <0.001
PM2.5/PM10         0.23-0.33     0.40-0.57      0.27      0.48      <0.001
RH                 54.5-74.6     64.6-77.7      64        72.22     <0.001

Note: * Mann-Whitney test.


Table 3. Statistics of daily mean SRS parameters on DD (67 cases) and
NDD (664 cases) in the years 2007-2008.

        First-third quartiles        Median
        DD           NDD             DD       NDD      p-Value*
AOD     0.29-0.67    0.19-0.35       0.46     0.25     <0.001
AE      0.55-0.62    0.60-0.63       0.59     0.62     <0.001
SSA     0.91-0.93    0.91-0.94       0.92     0.93     0.09
AAI     0.93-1.76    0.86-1.57       1.38     1.22     0.07

Note: * Mann-Whitney test.

Figure 5. Box plots of the distribution of AAI (left) and SSA (right) on DD and NDD in the study
area in 2007-2008.

Table 4. Availability of SRS aerosol products over the study area in 2007-2008.

                                                                              Data availability (%)
Sensor                   Algorithm          Retrieved parameter              All days   DD    NDD
MODIS (Terra and Aqua)   Dark target (DT)   Aerosol optical depth (AOD)      73         87    71
                                            Ångström exponent (AE)           72         84    71
                         Deep blue (DB)     Aerosol optical depth (AOD)      55         43    58
                                            Ångström exponent (AE)           55         43    58
                                            Single scattering albedo (SSA)   20         22    20
OMI (Aura)                                  Aerosol absorbing index (AAI)    84         94    83


Table 5. CART model output ‘rules’ that characterize DD.

CART ‘rules’                                    Parameter name in DB2
Month > 6 and AAI < 0.03                        x1
Month < 6 and AOD > 0.3160                      x2
Month ≥ 6 and AOD > 0.5636 and AAI > 0.03       x3
Month < 4 and 0.2077 < AOD < 0.3160             x4

Note: The model uses only SRS aerosol products and the calendar month as input data (DB1).
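
The transformation of a daily DB1 record into the binary DB2 variables can be sketched directly from the rules in Table 5. The function name `encode_rules` is hypothetical, and two readings are assumptions made here: the conditions listed within one rule are joined by a logical AND, and the month condition of the third rule is taken as "greater than or equal to 6". Treating a missing retrieval as "rule not satisfied" is likewise an assumption, since the handling of missing data is not described.

```python
from typing import Optional, Tuple


def encode_rules(month: int, aod: Optional[float],
                 aai: Optional[float]) -> Tuple[int, int, int, int]:
    """Map one daily record (calendar month 1-12, MODIS AOD, OMI AAI)
    to the binary variables (x1, x2, x3, x4) of Table 5."""
    has_aod = aod is not None
    has_aai = aai is not None
    x1 = int(month > 6 and has_aai and aai < 0.03)
    x2 = int(month < 6 and has_aod and aod > 0.3160)
    # Assumed reading of the third rule: month >= 6.
    x3 = int(month >= 6 and has_aod and has_aai
             and aod > 0.5636 and aai > 0.03)
    x4 = int(month < 4 and has_aod and 0.2077 < aod < 0.3160)
    return x1, x2, x3, x4
```

For instance, a March day with AOD = 0.45 activates only the second rule, and an August day with AOD = 0.6 and AAI = 1.5 activates only the third.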


Table 5 details the rules obtained when applying the CART model on DB1, i.e. when
accounting for SRS aerosol products and the calendar month as the only potential explanatory
variables for predicting the occurrence of ground-perceived DD. The four rules found
by the CART algorithm were transformed into binary variables, $x_i$, $i = 1, \ldots, 4$, and used
in the logistic regression. Table 6 presents the results of one logistic regression (out of
10) based on one training subset of DB2. A pooled analysis of the 10 logistic regressions
revealed that the probability, P, that dust = 1 is

$$\operatorname{logit}(P) = -3.5 + 19.07x_1 + 2.65x_2 + 3.4x_3 + 1.85x_4. \qquad (2)$$

The threshold P = 0.47 for the classification of DD was determined after examining the
ROC curves that were produced for each of the 10 training sets. The area under these ROC
curves ranged from 0.81 to 0.86, with a mean area of 0.83 (Figure 6).
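
To make the use of Equation (2) and the threshold concrete, the following sketch converts a vector of binary variables into a probability and a DD/NDD decision. The coefficients and the threshold are those reported in the text; the names `dust_probability` and `is_predicted_dust_day` are hypothetical, and the strict inequality at the threshold follows the wording "higher than a given threshold" in Section 3.

```python
import math
from typing import Sequence

INTERCEPT = -3.5
COEFFS = (19.07, 2.65, 3.40, 1.85)   # coefficients of x1..x4 in Equation (2)
THRESHOLD = 0.47                     # classification threshold from the ROC analysis


def dust_probability(x: Sequence[int]) -> float:
    """Invert logit(P) = b0 + sum(b_i * x_i) to obtain P."""
    logit = INTERCEPT + sum(b * xi for b, xi in zip(COEFFS, x))
    return 1.0 / (1.0 + math.exp(-logit))


def is_predicted_dust_day(x: Sequence[int],
                          threshold: float = THRESHOLD) -> bool:
    """Classify a day as DD if its probability exceeds the threshold."""
    return dust_probability(x) > threshold
```

When no rule is satisfied, the logit equals the intercept and the probability is about 0.03; when only the third rule is satisfied, the logit is -0.1 and the probability is about 0.475, just above the threshold.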

Table 6. Results of a typical logistic regression (out of 10 independent runs)
using a binary training subset of DB2.

             Coefficient   Standard error   t-Value   p
Intercept    -3.50         0.27             -12.91    <0.001
x1           19.07         1029.10          0.02      0.98
x2           2.65          0.74             3.58      <0.001
x3           3.40          0.37             9.20      <0.001
x4           1.85          0.56             3.32      <0.001

Figure 6. ROC curves of all 10 training sets of the logistic regression (Equation (2)). The black dot
marks the mean true-positive (TP) and false-positive (FP) rates and corresponds to the mean optimal 
threshold value (0.47). 


Based on the ground PM10 observations, the true fraction of DD in the study period was
~9%. Since CART used this information throughout the construction of DB2, the naive
prediction that all the days are NDD would have been wrong for ~9% of the days. Table 7
displays a summary of the logistic regression model validation results. The prediction error
has been calculated using equal penalties for misclassification of either DD (i.e. false-negative)
or NDD (i.e. false-positive). The prediction error represents the fraction of the observations
that were wrongly classified during the evaluation procedure, with B and C representing
the observations that were classified incorrectly as NDD and DD, respectively (A and D
represent the correctly classified observations). As can be seen, the mean prediction error
of the logistic regression model is 8.2%, with 94.7% and 61.5% correct classifications of
NDD and DD, respectively. In fact, in 90% of the evaluation runs, the prediction error was
smaller than 9% (the naive prediction). This demonstrates that the integrated CART-logistic
regression model can predict well the occurrence of ground-level-perceived DD using
the SRS aerosol products as model variables.
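
The quoted figures follow directly from the mean confusion-matrix entries of Table 7 (A = 126, B = 5, C = 7, D = 8). The short sketch below reproduces the arithmetic; the function name `evaluate` and the dictionary keys are hypothetical and introduced only for this illustration.

```python
from typing import Dict


def evaluate(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Summarize a confusion matrix laid out as in Table 7.

    a: NDD classified as NDD      b: DD classified as NDD (false negative)
    c: NDD classified as DD (false positive)      d: DD classified as DD
    """
    total = a + b + c + d
    return {
        "prediction_error": (b + c) / total,
        "ndd_correct": a / (a + c),
        "dd_correct": d / (b + d),
    }


mean_run = evaluate(a=126, b=5, c=7, d=8)
# prediction_error = 12/146, ndd_correct = 126/133, dd_correct = 8/13
```

These ratios round to 8.2%, 94.7%, and 61.5%, respectively, matching the values given in the text.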

Table 7. Statistics of the prediction error and the confusion matrix
as obtained in the evaluation of the logistic regression model.

                   True
Predicted          NDD (0)    DD (1)
NDD (0)            A          B
DD (1)             C          D
Total              133        13

                   Prediction error
                   (B + C)/(A + B + C + D)    A      B     C     D
Minimum            0.062                      120    3     4     6
First quartile     0.070                      124    3     5     7
Median             0.082                      126    5     7     9
Mean               0.082                      126    5     7     8
Third quartile     0.082                      128    6     9     10
Maximum            0.140                      129    7     13    10

Note: Both model predictions and true observations are binary classified as
NDD (0) or DD (1).


Table 8. Statistics of the prediction error and the confusion matrix of the alternative logistic
regression model (i.e. with ground RH records included) as obtained in the evaluation process.

                   Prediction error
                   (B + C)/(A + B + C + D)    A      B     C     D
Minimum            0.027                      123    0     4     6
First quartile     0.063                      125    2     6     9
Median             0.072                      126    4     7     10
Mean               0.068                      126    3     7     10
Third quartile     0.081                      127    4     8     11
Maximum            0.089                      129    7     10    13


Since ambient RH was shown to differ significantly between DD and NDD (Table 2),
we explored, following all the steps that were described earlier, whether incorporation of
RH data into DB1 can improve the model predictions. As expected, the RH was found
by CART to be a significant explanatory variable. An error analysis of this alternative
model (Table 8) shows a mean prediction error of 6.8%, with 94.7% and 76.9% correct
classifications of NDD and DD, respectively. Accounting for ground RH records improved
the model’s DD classification power by 15.4 percentage points (i.e. a 25% improvement relative to the base
model) and yielded 93.2% correct classifications overall (the mean area under the ROC curves was
0.91, indicating a better model). All of the prediction errors during the evaluation runs of
this alternative model were smaller than the baseline probability of DD (9%).
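
The improvement figures can be checked against the mean entries of Tables 7 and 8 (base model: A = 126, B = 5, C = 7, D = 8; model with RH: A = 126, B = 3, C = 7, D = 10). The sketch below is an arithmetic check only; the function and variable names are hypothetical.

```python
def dd_correct(b: int, d: int) -> float:
    """Fraction of true DD classified as DD: D / (B + D)."""
    return d / (b + d)


def overall_correct(a: int, b: int, c: int, d: int) -> float:
    """Fraction of all test days classified correctly: (A + D) / total."""
    return (a + d) / (a + b + c + d)


base_dd = dd_correct(b=5, d=8)        # 8/13, about 61.5%
rh_dd = dd_correct(b=3, d=10)         # 10/13, about 76.9%

absolute_gain = rh_dd - base_dd       # gain in DD classification (fraction)
relative_gain = absolute_gain / base_dd
rh_accuracy = overall_correct(a=126, b=3, c=7, d=10)
```

The absolute gain is 2/13 (about 15.4 percentage points), the relative gain is 25%, and the overall rate of correct classifications with RH included is 136/146, about 93.2%.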


5. Discussion 

The CART model results reveal that the most significant SRS aerosol product predictors 
of DD in southern Israel are the AOD and the AAI in agreement with Baddock, Bullard, 
and Bryant’s (2009) findings in Australia. Incorporation of ambient RH data was found to 
significantly improve DD detection. Hence, if available, ambient RH data are recommended 
as an additional model variable, besides the month, the AOD, and the AAI. However, it 
should be emphasized that due to lack of dense spatial coverage of ground RH monitoring 
in the study area, possible spatial RH variations could not be accounted for. Therefore, the
model results presented in Table 8 need to be re-evaluated in areas with enhanced spatial 
coverage of RH measurements. 

The AE was shown to significantly differ between DD and NDD (Table 3). Nonetheless, 
it did not turn out to be an important explanatory variable by the CART model. In contrast, 
the AAI differed only marginally between DD and NDD (Table 3), yet it was found by the 
CART to be a significant explanatory variable for DD classification. This probably results 
from the fact that the AE is derived from AOD retrievals at 470 and 670 nm; hence, the AE 
and the AOD are not independent of each other (e.g. retrieval of AE is not robust for low 
AOD). In contrast, since the AOD and the AAI represent distinct spectral signatures, they 
actually represent independent information and, thus, both were found useful for
identification of DD. Whereas the information that the AAI carries in relation to DD identification
may not be strong enough by itself (Table 3), when coupled with the information carried 
by the AOD, it has merit, as is evident from the CART output. In fact, the results of the 
CART model (Table 5) demonstrate that the optimal factors correspond to combinations of 
SRS aerosol products. These mixed factors are better predictors of DD than the factors that 
correspond to any single SRS aerosol product, very much like eigenvectors obtained by a 
principal component analysis. 
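
As a rough illustration of what such mixed factors look like, the sketch below builds 
binary variables that each combine more than one primary variable. The thresholds 
(AOD_SPLIT, AAI_SPLIT), the factor names, and the definition of the dust season as 
December to May are hypothetical placeholders chosen for illustration; they are not 
the splits selected by the CART model in this study. 

```python
from dataclasses import dataclass


@dataclass
class DayRecord:
    month: int   # calendar month, 1-12
    aod: float   # MODIS aerosol optical depth
    aai: float   # OMI aerosol absorbing index


# Hypothetical split values, for illustration only (not the fitted CART splits).
AOD_SPLIT = 0.4
AAI_SPLIT = 1.0
# Assumption: winter and spring taken as December through May.
DUST_SEASON = {12, 1, 2, 3, 4, 5}


def hybrid_factors(day: DayRecord) -> dict:
    """Return binary factors that each combine several primary variables."""
    high_aod = day.aod > AOD_SPLIT
    high_aai = day.aai > AAI_SPLIT
    in_season = day.month in DUST_SEASON
    return {
        "high_aod_and_high_aai": int(high_aod and high_aai),
        "high_aod_in_season": int(high_aod and in_season),
        "high_aai_low_aod": int(high_aai and not high_aod),
    }


if __name__ == "__main__":
    print(hybrid_factors(DayRecord(month=3, aod=0.8, aai=1.5)))
    print(hybrid_factors(DayRecord(month=7, aod=0.2, aai=1.5)))
```

Binary factors of this kind, rather than the raw AOD or AAI values, would then serve as 
the inputs to a logistic regression. 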

To further assess the model, we evaluated it for the same years (2007-2008) using data 
from a different geographical area (Gush Dan, central Israel), which is nonetheless expected 
to be affected by the same dust events. Specifically, the model that was developed and 
parameterized for southern Israel was applied to central Israel using SRS aerosol products 
over this region. To assess the model performance, its predictions were compared to a DD 
classification obtained using the modified Ganor scheme (see Section 3), based on the 
ground PM10 data from stations in the Gush Dan area. Table 9 reveals that the model 
performed comparably in the two geographical areas: a similar mean correct 
classification of DD and NDD was obtained in both regions for the matched cases (B and 
E, D and F). It should be emphasized again that, if available, ground RH data and 
prior information on the long-term DD/NDD ratio improve the predictive power of the model. 
Nonetheless, applying the model in regions other than the geographical area for which it 
was developed should be done with care. 


Table 9. Comparison of model evaluation results for southern and central Israel. 



Case  Area          Ground RH  Prior information  Mean correct classification (%) 
                               (DD/NDD ratio)     DD        NDD 

A     South Israel  -          +                  61.5      94.7 
B     South Israel  +          +                  76.9      94.7 
C     South Israel  -          -                  23.1      100 
D     South Israel  +          -                  30.7      100 
E     Gush Dan      +          +                  71.2      89.5 
F     Gush Dan      +          -                  39.0      97.4 


Notes: The models were developed and parameterized for southern Israel and either make use (+) or do not make 
use (-) of ground RH data and of prior information on the long-term DD/NDD ratio. According to the modified 
Ganor scheme for DD classification (see Section 3), in the years 2007-2008, southern Israel experienced 67 DD 
and 664 NDD, whereas central Israel experienced 64 DD and 666 NDD (the difference in the total number of 
days is due to data availability). 
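
The day counts in the notes and the percentages in Table 9 can be cross-checked with a 
few lines of arithmetic. The sketch below computes the baseline DD fraction for each area 
and, as an illustrative summary only, the mean of the DD and NDD correct-classification 
rates (often called balanced accuracy). Balanced accuracy is not a metric reported in 
this study, and the function and variable names are our own. 

```python
# Day counts (DD, NDD) for 2007-2008, taken from the notes to Table 9.
COUNTS = {"South Israel": (67, 664), "Gush Dan": (64, 666)}

# Matched cases from Table 9: (area, DD correct %, NDD correct %).
ROWS = {
    "B": ("South Israel", 76.9, 94.7),
    "E": ("Gush Dan", 71.2, 89.5),
    "D": ("South Israel", 30.7, 100.0),
    "F": ("Gush Dan", 39.0, 97.4),
}


def baseline_dd_fraction(area: str) -> float:
    """Fraction of all classified days in the area that were dust days."""
    dd, ndd = COUNTS[area]
    return dd / (dd + ndd)


def balanced_accuracy(dd_pct: float, ndd_pct: float) -> float:
    """Unweighted mean of the DD and NDD correct-classification rates (%)."""
    return (dd_pct + ndd_pct) / 2.0


if __name__ == "__main__":
    for area in COUNTS:
        print(f"{area}: baseline DD fraction = {baseline_dd_fraction(area):.3f}")
    for case, (area, dd_pct, ndd_pct) in ROWS.items():
        print(f"Case {case} ({area}): {balanced_accuracy(dd_pct, ndd_pct):.2f}%")
```

For southern Israel the baseline fraction is 67/731, or about 9.2%, consistent with the 
baseline probability of DD of 9% quoted earlier. 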

6. Conclusions 

This work demonstrates the possibility of identifying DD and NDD retrospectively using an 
objective statistical model with SRS aerosol products as input. The variables of the logistic 
regression prediction model were objectively selected by a CART model. The optimal 
variables of the logistic regression model are different combinations of the primary 
variables: the calendar month, the AOD, and the AAI. When ground RH data are available 
and/or when the long-term DD/NDD ratio is known, including these data improves the model 
performance significantly. It is noteworthy that using the hybrid factors selected by the 
CART as input variables to the logistic regression model, rather than the individual 
physical variables by themselves, as is often done when regressing SRS aerosol 
products against ground PM, improves the overall model prediction. 

We believe that a reliable partitioning of days into DD and NDD as perceived at the 
ground can be valuable in many research areas, particularly for environmental health 
studies, where exposure to dust and non-dust PM may have distinct health effects. 
Indeed, the literature reveals a plethora of associations (in terms of relative risks) 
between a spectrum of adverse health effects and exposure to PM of mineralogical 
composition or from combustion processes. The approach proposed in this study may enable 
identification of two subpopulations of days, which can be used to study the pertinent 
effects of exposure to PM from distinct sources. Thus, this approach may be used 
to develop more advanced epidemiological models and to enhance our understanding of the 
health effects of particles of distinct composition. In particular, owing to the wide 
spatial availability of SRS aerosol products, such a model may open the way to 
epidemiological studies in areas with limited ground air quality monitoring. 